Linear pattern matching on sparse suffix trees

Authors

  • Roman Kolpakov
  • Gregory Kucherov
  • Tatiana A. Starikovskaya
Abstract

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree [8] with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to log_σ n characters (where σ is the alphabet size), our index takes O(n / log_σ n) space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time O(m + r + r · occ), where m is the length of the pattern, r is the actual number of characters stored in a word, and occ is the number of pattern occurrences.
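The packing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' index; it only shows how r characters over an alphabet of size σ can be stored in one integer "word", so that a single integer comparison compares r characters at once. The function names `pack` and `unpack`, and the choice to pad the last word with the first alphabet symbol, are assumptions made for illustration.

```python
def pack(text: str, alphabet: str, r: int) -> list[int]:
    """Pack `text` into integers holding r characters each (base-sigma digits).

    Hypothetical helper: the final block is padded on the right with the
    alphabet's first symbol so that every word has exactly r digit slots.
    """
    code = {ch: i for i, ch in enumerate(alphabet)}
    sigma = len(alphabet)
    words = []
    for start in range(0, len(text), r):
        block = text[start:start + r]
        value = 0
        for ch in block:
            value = value * sigma + code[ch]
        value *= sigma ** (r - len(block))  # pad a short final block
        words.append(value)
    return words


def unpack(words: list[int], alphabet: str, r: int, n: int) -> str:
    """Inverse of `pack`; `n` is the original text length (drops padding)."""
    sigma = len(alphabet)
    chars = []
    for value in words:
        digits = []
        for _ in range(r):
            value, d = divmod(value, sigma)
            digits.append(alphabet[d])
        chars.extend(reversed(digits))  # digits were extracted last-first
    return "".join(chars[:n])


text = "ACGTACGTAC"
words = pack(text, "ACGT", r=4)
# Comparing words[0] and words[1] compares four characters in one operation.
same_block = words[0] == words[1]
restored = unpack(words, "ACGT", r=4, n=len(text))
```

With n characters and r characters per word, the packed representation uses ⌈n / r⌉ words, which matches the O(n / log_σ n) space bound of the abstract when r = log_σ n.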

Related resources

Suffix Trees and Suffix Arrays

Iowa State University. Contents: 1.1 Basic Definitions and Properties; 1.2 Linear Time Construction Algorithms (Suffix Trees vs. Suffix Arrays; Linear Time Construction of Suffix Trees; Linear Time Construction of Suffix Arrays; Space Issues); 1.3 Applications ...

Sparse compact directed acyclic word graphs

The suffix tree of string w represents all suffixes of w, and thus it supports full indexing of w for exact pattern matching. On the other hand, a sparse suffix tree of w represents only a subset of the suffixes of w, and therefore it supports sparse indexing of w. There has been a wide range of applications of sparse suffix trees, e.g., natural language processing and biological sequence analy...

On-Line Linear-Time Construction of Word Suffix Trees

Suffix trees are the key data structure for text string matching, and are used in a wide range of application areas such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and w be a string i...

An Estimation of the Size of Non-Compact Suffix Trees

A suffix tree is a data structure used mainly for pattern matching. It is known that the space complexity of simple suffix trees is quadratic in the length of the string. By a slight modification of the simple suffix trees one gets the compact suffix trees, which have linear space complexity. The motivation of this paper is the question whether the space complexity of simple suffix trees is qua...

Sparse Directed Acyclic Word Graphs

The suffix tree of string w is a text indexing structure that represents all suffixes of w. A sparse suffix tree of w represents only a subset of the suffixes of w. An application of sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version ...

Journal:
  • CoRR

Volume: abs/1103.2613

Publication year: 2011